Salesforce Data Cloud Ingestion from Sitemaps - Implementation Template

Application details

Technical considerations

  • One instance of the Mule application is deployed per domain
  • Sitemaps can be discovered from the organization's robots.txt file
  • Processing Sitemap index files is out of scope
  • Content referenced by a Sitemap should generally be served as "text/html"
  • Authentication is neither required nor supported
  • Both synchronous and asynchronous scans ingest the full load
  • The Mule application is designed to be stateless
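
As an illustration of the robots.txt discovery mentioned above, the following is a minimal Python sketch and not the template's actual Mule flow. It assumes the robots.txt body has already been downloaded as text; the function name discover_sitemaps is hypothetical.

```python
from typing import List


def discover_sitemaps(robots_txt: str) -> List[str]:
    """Return the Sitemap URLs declared in a robots.txt body.

    URLs are returned in file order with duplicates removed.
    """
    found: List[str] = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, because the URL contains colons too.
        name, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and name.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found
```

Because one application instance is deployed per domain, a sketch like this would only ever need to read a single robots.txt file.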

Activity diagrams

The following activity diagrams illustrate the processing sequence for ingesting the metadata of unstructured content, and the content itself, on demand.

Initial Load/Full Refresh Synchronous

sdc-ingest-sitemaps-full-refresh.png

Initial Load/Full Refresh Asynchronous

sdc-ingest-sitemaps-full-refresh-async.png

Get Content

sdc-ingest-sitemaps-retrieve-content.png

Processing logic

The primary handling and orchestration of unstructured metadata ingestion is implemented in the Salesforce Data Cloud Ingestion from Sitemaps Process API. The process is described in more detail in the following sections.

Initial Load/Full Refresh Synchronous

  1. A user action from Data Cloud initiates the request for a full refresh of the content metadata
  2. Data Cloud invokes the Mule application without a continuation token to start the process
  3. The Mule application receives the request and will:
    • Retrieve the content metadata from all the configured organizations' Sitemaps
    • Transform the results into the Data Cloud format and return the results
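
The retrieve-and-transform step above could look roughly like the following Python sketch. It assumes a standard Sitemap document using the sitemaps.org 0.9 namespace. The output field names ("url", "lastModified") and the function name sitemap_to_records are hypothetical, since the actual Data Cloud format is not specified here. Sitemap index files are rejected because they are out of scope.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace defined by the Sitemaps protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_to_records(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Parse a Sitemap document into a list of metadata records."""
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        raise ValueError("Sitemap index files are out of scope")
    if root.tag != SITEMAP_NS + "urlset":
        raise ValueError("unexpected root element: " + root.tag)

    records: List[Dict[str, Optional[str]]] = []
    for url in root.findall(SITEMAP_NS + "url"):
        loc = url.findtext(SITEMAP_NS + "loc")
        if not loc or not loc.strip():
            continue  # skip entries without a usable location
        lastmod = url.findtext(SITEMAP_NS + "lastmod")
        records.append({
            "url": loc.strip(),  # hypothetical field name
            "lastModified": lastmod.strip() if lastmod else None,  # hypothetical
        })
    return records
```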

Initial Load/Full Refresh Asynchronous

  1. The Mule application receives a request to perform an asynchronous refresh of all metadata and will:
    • Retrieve the content metadata from all the configured organizations' Sitemaps
    • Transform the results into the required format for the ingestion API
    • Send the transformed data to the ingestion endpoint
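
The three asynchronous steps above (retrieve, transform, send) can be sketched as a small pipeline. This Python outline is illustrative only: the step functions are injected as parameters, and all names (run_async_refresh, retrieve, transform, send) are hypothetical. How the template actually calls the ingestion endpoint is not described here.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_async_refresh(
    sitemap_urls: Iterable[str],
    retrieve: Callable[[str], Iterable[Any]],
    transform: Callable[[Any], Record],
    send: Callable[[List[Record]], None],
) -> int:
    """Run retrieve -> transform -> send for each configured Sitemap.

    Returns the total number of records handed to `send`.
    """
    total = 0
    for sitemap_url in sitemap_urls:
        # Step 1: retrieve the content metadata for this Sitemap.
        metadata = retrieve(sitemap_url)
        # Step 2: transform every item into the ingestion format.
        payload = [transform(item) for item in metadata]
        # Step 3: send the transformed data, skipping empty payloads.
        if payload:
            send(payload)
            total += len(payload)
    return total
```

Passing the steps in as callables keeps the function free of stored state between calls, in line with the stateless design noted under the technical considerations.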

Get Content

  1. Data Cloud initiates the request to retrieve the content
  2. The Mule application receives the request, then retrieves and streams the page content listed in a Sitemap
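
To illustrate what streaming the page content means in practice, the sketch below reads a binary source in fixed-size chunks rather than loading the whole page into memory. It is a generic Python example, not the template's implementation; the function name stream_content and the default chunk size are assumptions.

```python
import io
from typing import BinaryIO, Iterator


def stream_content(source: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the content of `source` in chunks of at most `chunk_size` bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # an empty read signals end of content
        yield chunk


# Usage with an in-memory page standing in for a fetched HTML document.
page = io.BytesIO(b"<html>hello</html>")
chunks = list(stream_content(page, chunk_size=8))
```

Any object with a compatible read method can serve as the source, for example the response returned by urllib.request.urlopen.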

Success conditions

Upon successful completion, the following conditions will be met:

  • All metadata associated with unstructured content in the organization's Sitemaps is retrieved and processed.
  • The full load of metadata is retrieved on demand.
  • Retrieval of content is supported.

Type: Template
Organization: MuleSoft
Published by: MuleSoft Solutions
Published on: Nov 21, 2024
Asset versions for 1.0.x: 1.0.11, 1.0.10, 1.0.9